Real-Time with Windows NT

Windows NT has been designed from the ground up to be a highly responsive, general-purpose operating system. To the real-time developer, this implies that there are some areas where Windows NT will not be suitable for real-time applications as a result of basic design choices made in its architecture. Topics of interest in real-time systems, each covered in its own section of this paper, include responding to external events, priorities and scheduling, synchronization requirements, and deterministic response times.

Paradoxically, many of these design choices made within Windows NT actually result in a high level of responsiveness. This paper will discuss these important real-time topics in terms of the capabilities offered by Windows NT. For reference, there is a section in this paper that provides a high-level discussion of the Windows NT architecture.

Responding To External Events

Real-time applications are designed to respond to external events within a specified time interval. Windows NT offers strong capabilities in the areas of both interrupt management and I/O management.

Interrupts
Real-time applications use interrupts as a way of ensuring that external events are noticed by the operating system. It is critical that interrupts be handled promptly, according to their relative priority.

Within Windows NT, the kernel and the Hardware Abstraction Layer (HAL) are tuned to optimize interrupt delivery and event dispatching. The kernel provides interrupt dispatching to the rest of the system. The kernel can operate at one of thirty-two possible interrupt levels, as shown in the following table; these levels ensure that time-critical tasks are completed before other, less time-critical work. The kernel reserves eight interrupt levels for its own use. The remaining twenty-four interrupt levels are mapped onto hardware interrupts by the HAL.

Interrupt      Definition
Level 31       Hardware error interrupt
Level 30       Powerfail interrupt
Level 29       Inter-processor interrupt
Level 28       Clock interrupt
Levels 12-27   These levels map to the traditional interrupt levels 0-15 used in PCs
Levels 4-11    These levels are not generally used
Level 3        Software debugger interrupt
Levels 0-2     Reserved for software-only interrupts to prioritize work within device drivers and executive components

Windows NT handles interrupts on a preemptive basis; when an interrupt occurs, all execution at lower interrupt levels is suspended and execution begins immediately on the highest-level request. Processing continues until the highest-level request has been completed. This places a responsibility on device drivers, because system responsiveness depends directly on how quickly a device driver exits its interrupt routine.

Another way to state this is that Windows NT offers applications a multilevel interrupt mask. Higher-priority interrupts can occur when the interrupt mask allows them to occur. Changing the interrupt mask raises the level so that lower-level interrupts cannot use system resources until the handling routine for the higher-level interrupt has been completed.
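The preemption rule described above can be sketched as a small simulation. This is only a toy model written for this discussion, not Windows NT code: the function name, the one-tick time step, and the sample requests are all assumptions chosen for illustration.

```python
def completion_order(requests):
    """Simulate preemptive, level-based interrupt handling (toy model).

    requests: list of (arrival_tick, level, ticks_needed, name).
    Returns handler names in the order in which they finish.
    """
    waiting = sorted(requests, key=lambda r: r[0])  # ordered by arrival tick
    active = []      # requests that have arrived but not yet finished
    finished = []
    tick = 0
    while waiting or active:
        # Admit every request that has arrived by this tick.
        while waiting and waiting[0][0] <= tick:
            _, level, need, name = waiting.pop(0)
            active.append({"level": level, "left": need, "name": name})
        if active:
            # Only the highest-level request runs; lower levels stay masked.
            current = max(active, key=lambda r: r["level"])
            current["left"] -= 1
            if current["left"] == 0:
                active.remove(current)
                finished.append(current["name"])
        tick += 1
    return finished


# Hypothetical workload: a disk interrupt is preempted by the clock interrupt,
# and level-2 software work waits until both have completed.
order = completion_order([
    (0, 14, 3, "disk"),
    (1, 28, 1, "clock"),
    (0, 2, 2, "software"),
])
```

In this run the clock handler finishes first even though it arrived last, which is the behavior the interrupt mask is meant to guarantee.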

Multiprocessor systems
Windows NT is designed for multi-processor systems. When an interrupt is dispatched, the kernel dispatches the interrupt to just one of the processors in the system. All other processors continue executing uninterrupted. Interrupts can be handled on any of the processors in a machine; this allows interrupts to be handled by idle processors, rather than concentrating the load on a single processor. Use of multiprocessor systems can offer significant benefits for real-time applications.

Asynchronous I/O
Asynchronous I/O is a very powerful mechanism for user-level real-time applications; the application can queue I/O and continue processing without having to either wait or respond immediately to some end-of-I/O event. Additionally, there are completion mechanisms in the I/O system (completion port I/O) that efficiently use the kernel synchronization and executive scheduling capabilities to distribute I/O completion processing to the most recently busy thread. This assures that cache is not invalidated and that the system makes efficient use of the processing power available to it. This can pay enormous dividends on multi-processor systems and has no appreciable overhead on single-processor systems.

In many cases (such as a Win32 application), asynchronous I/O may not be important and the application will wait for the I/O to complete before returning. However, in the case where the user (or kernel component) wishes to do work while the asynchronous I/O is completing, they can specify that they do not wish to wait for the request to complete and can continue working in the rest of the application. When the asynchronous I/O eventually completes, an event or some other notification mechanism will fire. The application can check for this completion event at some future time when it is convenient to do so within the application.
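The queue-then-check pattern described above can be illustrated with a short sketch. It uses Python's standard thread pool as a stand-in and is not the Win32 asynchronous I/O interface; slow_read and overlap_work_with_io are hypothetical names, and the 50-millisecond delay is an arbitrary assumption that simulates a slow device.

```python
from concurrent.futures import ThreadPoolExecutor
import time


def slow_read(nbytes):
    """Stand-in for a device read that takes a while to complete."""
    time.sleep(0.05)
    return bytes(nbytes)


def overlap_work_with_io():
    """Queue a read, keep working, and collect the result once it is ready."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(slow_read, 16)  # queue the I/O and return at once
        work_done = 0
        while not pending.done():             # check for completion when convenient
            work_done += 1                    # other useful work would happen here
            time.sleep(0.001)
        data = pending.result()               # the completed transfer
    return len(data), work_done
```

The application decides when to look for completion; the request itself never blocks the work loop.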

Device Drivers
Device drivers are very important to real-time users of Windows NT. In particular, processing in a device driver's interrupt routine will proceed to completion unless a higher-level interrupt occurs, which is something that many real-time applications want. In order to get this kind of performance, however, the device driver code must be extremely solid. Windows NT device drivers run entirely within the system process and have access to all hardware through the HAL. A typical device driver will have several components as described in the following table.

Component                       Description
Initialization Routine          This routine initializes hardware and sets up data structures used by the driver at startup time
Interrupt Service Routine (ISR) This routine handles an interrupt on the device that the device driver controls
Deferred Procedure Call (DPC)   One or more DPCs handle non-time-critical processing for the driver
System Thread                   Some, but not all, drivers will have a system thread, which is for very low priority work

When a device driver starts, the initialization routine will typically make the driver known to the system, register some entry points, and register an ISR. The device driver will wait, consuming only memory resources, until an interrupt occurs that meets the criteria of the driver's ISR; the driver's ISR is then entered. The driver will not be interrupted until the end of its interrupt service routine unless a higher-level interrupt occurs. Unlike some other operating systems, an ISR on Windows NT can be interrupted by another ISR with higher priority; this is one reason that interrupt latency is hard to define for Windows NT.

When a driver is in its interrupt service routine, it should perform the minimum processing necessary to handle the interrupt, save the state necessary for processing the interrupt, queue a DPC routine for later processing that is not time-critical, and return. The DPC will run at some later time, although it may run immediately after the interrupt service routine returns if the system is not very busy. DPCs will run to the exclusion of all other processing (other than ISRs) until the DPC exits. Most device driver processing is done in this deferred routine or in even lower priority routines queued by this DPC.

A number of important rules apply to DPCs. The most important rule is that a DPC cannot wait or lock up the system. Also important is that the DPC must have all memory it accesses locked down in physical memory so that it cannot incur page faults. It should be possible, using the support routines and driver model provided by Windows NT, to write device drivers that handle even the most complex and high speed data acquisition hardware.
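The division of labor between the ISR and the DPC can be sketched as follows. This is a conceptual model only and does not use the Windows NT driver interfaces; DriverModel, isr, and drain_dpcs are hypothetical names, and the "processing" (summing the samples) is a placeholder for real deferred work.

```python
from collections import deque


class DriverModel:
    """Toy model of a driver that defers work from its ISR to a DPC queue."""

    def __init__(self):
        self.dpc_queue = deque()  # deferred work, handled in arrival order
        self.log = []             # records what ran, and when

    def isr(self, samples):
        # Minimum work only: note the interrupt, save the state the
        # deferred routine will need, queue that routine, and return.
        self.log.append("isr")
        self.dpc_queue.append(list(samples))

    def drain_dpcs(self):
        # Runs later, once no interrupt is being serviced. Each deferred
        # routine runs to completion and never waits.
        while self.dpc_queue:
            samples = self.dpc_queue.popleft()
            self.log.append(f"dpc:{sum(samples)}")


driver = DriverModel()
driver.isr([1, 2])   # two interrupts arrive back to back
driver.isr([4])
driver.drain_dpcs()  # the deferred processing happens afterwards
```

Both ISR entries appear in the log before either DPC entry, which mirrors the intent that time-critical work finishes before deferred work begins.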

Priorities And Scheduling

Real-time applications, by definition, have a time component associated with their behavior. In this context, it is important to understand how Windows NT assigns priorities to applications and schedules their execution. This section also discusses several other elements of the operating system and how their use can affect real-time applications.

Process priority
Within Windows NT, user applications are defined as processes. Windows NT is a pre-emptive, multi-tasking operating system that allows multiple processes (i.e., applications) to run within the system at the same time. A process has a number of properties that are associated with it. For real-time applications, one of the most important properties is the priority class (such as real_time) that defines the basic priority at which the application will run. The priority model within Windows NT includes 32 priority levels of which 16 are reserved for the operating system and real-time processes. Note that priority levels are different from the dispatch interrupt levels discussed in the kernel section. User applications almost always run at interrupt level 0, regardless of the priority level they are set to.

Each process maintains a private address space to ensure that it will not interfere with other processes. Each process has a base priority class. As shown at left, real-time applications can run with a base priority class of 31 (highest priority), 24, and 16. Typically, real-time processes will run at priority 24. Other applications (dynamic classes) have base priority class of 15, 13, 9 (normal foreground process), 7, 4, 1, and 0.

Each process also has associated with it, within the same address space, one or more threads where each thread represents an independent portion of that process. The number of threads is limited only by available memory and resources. The properties associated with the process, including the priority level, are inherited by these threads.

Each thread has a current priority that is derived from the process' priority class; an API call can raise or lower it, within defined limits, relative to the process' base priority. For example, a process running at real_time class 24 can have threads that run anywhere between levels 22 and 26, depending on their own independent priority. These threads will always stay within the real_time priority class.
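The arithmetic in this example can be written out as a short sketch. The offset limit of two levels is inferred from the 22 to 26 example above, and the 16 to 31 bounds are an assumption based on the sixteen levels reserved for the operating system and real-time processes; thread_priority is a hypothetical helper, not a Windows NT API.

```python
REALTIME_RANGE = (16, 31)  # assumed bounds of the real_time priority class


def thread_priority(base, offset, class_range=REALTIME_RANGE, max_offset=2):
    """Return a thread's priority from its process base priority and an offset.

    The offset is limited to +/- max_offset (assumed from the 22-26 example),
    and the result is kept inside the priority class.
    """
    if abs(offset) > max_offset:
        raise ValueError("offset outside the permitted range")
    low, high = class_range
    return max(low, min(high, base + offset))


highest = thread_priority(24, +2)   # top of the range for a base of 24
lowest = thread_priority(24, -2)    # bottom of the range for a base of 24
```

With a base of 24 the two calls give 26 and 22, matching the range quoted in the text; a base of 31 or 16 is clamped so the thread stays inside the class.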

Threads are independently scheduled by the executive. A process has associated with it a quantum, which is the maximum amount of time one of these threads can execute before the system checks to see if other threads with the same priority in the system want to execute. In general, real-time processes will have priority over almost all other activities or system events. However, for processes in the spectrum of dynamic classes that are running at lower priority levels, a number of events within the system, such as I/O completion, can cause a temporary priority boost for a thread, giving it priority within a process.

Finally, there is a single system process, within which there can be multiple system threads running. This system process runs the kernel, the executive, and all device drivers. All of these components share a single address space, called "system space". A device driver, executive component, or the kernel can create a new system thread at any time; these threads can be used to do work in the context of the system process. This technique of running a thread within the context of the system, where it has direct access through the HAL to device hardware, might be of interest to real-time engineers.

Memory management
Memory management is another area in which many real-time engineers are interested. Windows NT is built around a virtual memory system. For real-time applications, Windows NT solves many of the problems that face real-time developers using more traditional virtual memory systems. First, paging I/O occurs at a lower priority level than the real-time priority levels. Paging within the real-time process itself can still occur, but this ensures that background virtual memory management won't interfere with processing at real-time priorities.

Second, Windows NT permits an application to lock itself into memory so that paging within its own process does not affect it. This allows even very large processes (such as raster image processing where some processes are over 100MB in size) to lock all of their memory down into physical memory and avoid the overhead of paging, while allowing the rest of the system to function normally.

Finally, Windows NT memory management allows memory mapping which permits multiple processes, even device drivers and user applications, to share the same physical memory. This results in very fast data transfers between cooperating processes or between a driver and an application. Memory mapping can be used to dramatically enhance real-time performance.
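The idea of sharing data through a mapping can be illustrated with a small sketch. It uses Python's portable mmap module on a temporary file rather than the Windows NT interfaces, and it creates both views in a single process for simplicity; share_through_mapping is a hypothetical name used only here.

```python
import mmap
import tempfile


def share_through_mapping(payload):
    """Write bytes through one mapped view and read them back through another."""
    with tempfile.TemporaryFile() as backing:
        backing.write(b"\x00" * mmap.PAGESIZE)  # the mapping needs a non-empty file
        backing.flush()
        writer = mmap.mmap(backing.fileno(), mmap.PAGESIZE)
        reader = mmap.mmap(backing.fileno(), mmap.PAGESIZE)
        try:
            writer[:len(payload)] = payload        # no explicit read or write call
            return bytes(reader[:len(payload)])    # visible through the other view
        finally:
            reader.close()
            writer.close()


received = share_through_mapping(b"sample")
```

Because both views refer to the same underlying pages, the data does not have to be copied through a separate transfer call, which is the source of the speed advantage described above.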

Cache management
Cache management is one of the drawbacks of using a general purpose operating system such as Windows NT for real-time applications. Memory caching is a technique that uses a small amount of high-speed memory to hold the most recently used code or data. If the next instruction or piece of data is not in the cache, the CPU retrieves it from the slower main memory. Using a cache results in the best average system performance for an operating system, but it does introduce an element of timing unpredictability in real-time environments.

Synchronization Requirements

One of the most difficult tasks of real-time systems is ensuring that different threads and processes stay synchronized. That is, within a real-time application, the timing at which different activities occur is important. For example, if one part of the application completes before a second part gets the most current data, then the process that the application is monitoring may become unstable. Synchronization results from ensuring that application components are prioritized properly.

Kernel Synchronization
Most of the work in the kernel is performed at the highest software interrupt level (known as dispatch_level) or above. The kernel's job consists primarily of synchronization of execution on multiple processors, dispatching, and system database maintenance; it does very little work that is not a direct consequence of a request by a user or subsystem.

The kernel also has a rich set of dispatch objects; these objects synchronize execution within device drivers and Windows NT executive components. Included in this set of dispatch objects are various timers, events, mutexes and semaphores. These objects can all be used in a number of ways to synchronize execution as necessary within the Windows NT executive and kernel. These objects are also used by subsystems to implement the synchronization primitives exported to user applications.

Timers
With general purpose operating systems that use virtual memory and caching algorithms, it is often difficult to ensure that events can take place within specified periods of time.

Windows NT offers several timers that can be used to obtain more deterministic time intervals for managing events in real-time environments. These timers generate software interrupts from the kernel. With Windows NT Workstation 3.5, applications can read the basic system timer with the GetTickCount() API. The resolution of this timer is 10 milliseconds. Several CPUs support a high-resolution counter that offers much finer resolution. The Win32 API called QueryPerformanceCounter() returns the current value of this high-resolution performance counter. For Intel®-based CPUs, the resolution is about 0.8 microseconds. For MIPS-based CPUs, the resolution is about twice the clock speed of the processor. Call QueryPerformanceFrequency() to get the frequency of the high-resolution performance counter, which is needed to convert counter values into elapsed time.
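The relationship between counter values, frequency, and elapsed time is simple arithmetic and can be sketched as follows. The function below is a hypothetical helper, not a Win32 call, and the frequency of 1,193,182 Hz is an assumption chosen because it is consistent with the roughly 0.8 microsecond resolution quoted above for Intel-based systems.

```python
ASSUMED_FREQUENCY_HZ = 1_193_182  # assumed counter frequency, in counts per second


def elapsed_microseconds(start_count, end_count, frequency_hz):
    """Convert a difference in counter values into microseconds."""
    return (end_count - start_count) * 1_000_000 / frequency_hz


# One count at the assumed frequency is the finest interval that can be measured.
resolution_us = elapsed_microseconds(0, 1, ASSUMED_FREQUENCY_HZ)

# A full second's worth of counts converts back to one million microseconds.
one_second_us = elapsed_microseconds(0, ASSUMED_FREQUENCY_HZ, ASSUMED_FREQUENCY_HZ)
```

An application would read the counter before and after the activity of interest and pass both readings, with the reported frequency, to a conversion like this one.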

Spinlocks
Another method that ensures proper synchronization is a spinlock. A spinlock is a locking mechanism associated with a global data structure that ensures that only one thread can get access to that data at any one time. Once the first thread is done, it releases the spinlock so that other threads can then get access to that data. Within Windows NT, spinlocks are often used by device drivers in order to ensure that device registers or other data structures can be accessed by only one device driver at a time. Real-time applications can use spinlocks to synchronize timing events during an interrupt response or other similar activity.
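The acquire, use, and release cycle described above can be sketched in user-mode code. This is an illustration of the busy-waiting idea only, not the kernel spinlock that Windows NT device drivers use; SpinLock and run_counters are hypothetical names, and the thread and iteration counts are arbitrary.

```python
import threading
import time


class SpinLock:
    """Busy-waiting lock built on a non-blocking acquire (illustrative only)."""

    def __init__(self):
        self._flag = threading.Lock()

    def acquire(self):
        # Keep retrying until the flag is free instead of sleeping on it.
        while not self._flag.acquire(blocking=False):
            time.sleep(0)  # let the holder run; a kernel spinlock would simply spin

    def release(self):
        self._flag.release()


def run_counters(threads=4, increments=500):
    """Have several threads update one shared counter under the spinlock."""
    lock = SpinLock()
    shared = {"count": 0}

    def worker():
        for _ in range(increments):
            lock.acquire()
            try:
                shared["count"] += 1  # only one thread at a time reaches this line
            finally:
                lock.release()

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return shared["count"]
```

Because every update happens while the lock is held, the final count equals the number of threads multiplied by the number of increments.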

Deterministic Response Times

With real-time systems, it is important to understand how quickly the operating system can respond to external events. The more deterministic the operating system can be, the more suitable the system will be for real-time applications.

Latency
To process an interrupt, three steps are generally taken. The first is the hardware interrupt latency. This represents the time that it takes for the CPU to finish processing the current instruction, flush the instruction pipeline, read the interrupt vector, locate the address of the Windows NT trap handler, and jump to that address. Second, the trap handler records the current machine state and creates a trap frame that records the execution state of the thread that was interrupted, including program counters, registers, and other information. At this point, the trap handler starts an interrupt dispatcher, which determines the source of the interrupt and then transfers control to an external routine, called an Interrupt Service Routine (ISR), or to an internal kernel routine. The ISR is provided by the device driver for the particular device that caused the interrupt. Finally, the ISR starts an I/O transfer to or from the device, and the CPU executes other threads while the device completes the transfer. When the transfer is complete, the device again interrupts the CPU for service. Frequently, in real-time environments, latency refers to the total time that it takes for these steps to occur; that is, the amount of time that it takes for the CPU to acknowledge and handle an interrupt.

Sample measurements
In a recent paper delivered at the 1995 Digital Communications Design Conference (see note 2), the ability of Windows NT to handle real-time activities was measured. These measurements were designed to assess the suitability of Windows NT as a platform for a TCP/IP router.

Measurement                       Duration
Hardware Interrupt Latency        1.8 - 2.9 microseconds
Interrupt Dispatching             4.6 - 10.5 microseconds
Interrupt Service Routine Length  10.3 - 16.7 microseconds
Total Elapsed Time                16.7 - 30.1 microseconds

The paper concluded that Windows NT was appropriate for use as a real-time system. The basic measurements reported in the paper (see note 3) are listed in the table above. The paper attributed the primary variation in the overall duration of the event to the effects of virtual memory and, in particular, the cache manager.
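As a quick check, the total elapsed time in the table is the sum of the three stages that precede it. The sketch below repeats that arithmetic using the figures from the table; the dictionary and function names are hypothetical and exist only for this illustration.

```python
# (minimum, maximum) duration of each stage in microseconds, from the table.
MEASUREMENTS_US = {
    "hardware interrupt latency": (1.8, 2.9),
    "interrupt dispatching": (4.6, 10.5),
    "interrupt service routine length": (10.3, 16.7),
}


def total_range(stages):
    """Add the per-stage minimums and maximums, rounded to one decimal place."""
    low = sum(lo for lo, _ in stages.values())
    high = sum(hi for _, hi in stages.values())
    return round(low, 1), round(high, 1)


total_low_us, total_high_us = total_range(MEASUREMENTS_US)
```

The result, 16.7 to 30.1 microseconds, agrees with the Total Elapsed Time row, and it shows that interrupt dispatching and ISR length account for most of the spread.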

2 Brian Catlin, Design of a TCP/IP Router Using Windows NT. Mr. Catlin is a principal at Catlin & Associates in Redondo Beach, CA. The firm's primary business is systems analysis and programming.

3 The system being measured was a Hewlett-Packard XU 5/90 personal computer with one 90 MHz Pentium CPU, 256 KB synchronous cache, 16 MB memory and 540 MB of disk space. Measurement test equipment included various Hewlett-Packard systems.

















